PATIENT DATA MINING FOR CLINICAL TRIALS

Cross Reference to Related Applications 

This application claims the benefit of U.S. Provisional Application Serial No. 
60/335,542, filed on November 2, 2001, which is incorporated by reference herein in 
its entirety. 

Field of the Invention 

The present invention relates to medical information processing systems and,
more particularly, to a computerized system and method for selecting persons for
clinical trials.

Background of the Invention 

Selection of persons for clinical trials is an expensive process. It is estimated 
that it costs drug companies several thousand dollars for each participant selected. 
Furthermore, sometimes even after being selected, persons must be dropped from a 
trial because of inaccurate or incorrect information. This may delay the trial, causing 
an even greater expense. 

Although drug companies try to get the word out by placing advertisements 
or through direct contact with physicians, the selection process is generally quite 
inefficient. Physicians tend to be busy and do not always have time to respond to 
requests for patients, and patients may not see the advertisements for clinical trials or 
subscribe to the periodicals where they are placed. 

Moreover, physicians at a specialized medical center tend to refer patients to
trials sponsored at that center. Many physicians are unaware of all the available 
clinical trials because of the time it takes to keep current on all available trials for 
every patient that the physician sees. 

In addition, clinical trials often call for very specific selection criteria and it may 
be difficult to ascertain if a particular person qualifies for a trial. Furthermore, 
because hospitals typically store information in an unstructured manner, it may be 
impossible using hospital records to select patients qualifying for particular clinical 
trials. 

An equally important problem is that of matching clinical trials to specific 
patients. For example, for cancer alone, at any point in time there are over 600 trials 
in progress. Statistics show that clinical trial web sites total 75,000 hits every week, 
mostly from patients seeking information about trials, who are trying to get added to
a trial. Estimates from the National Cancer Institute indicate that only two percent of
those patients eligible for a trial are in a trial. Thus, it is critically important for an 
individual to know if he or she may be eligible for a trial. 

Given the importance and expense of selecting qualified persons for clinical 
trials, it would be desirable and highly advantageous to provide improved techniques 
for automatically selecting prospective participants for clinical trials. 

Summary of the Invention 

The present invention provides a technique for selecting prospective 
participants in a clinical trial. 

In various embodiments of the present invention, a method is provided that
includes receiving a request for a list of prospective participants meeting specified 
criteria for a clinical trial. A set of patient records is then retrieved to determine 
persons meeting the specified criteria. 

The specified criteria may include probability information, thus allowing the 
selection of patients likely to meet the specified criteria for the clinical study (e.g.,
90% likelihood of diabetes, 70% likelihood of hypertension). In this case, the relevant
patient records would include probabilistic information to allow for such selection. 
Additional information for each prospective participant may also be retrieved. This 
additional information may include information about other clinical trials that the 
person participated in, including whether a placebo was administered. 

Furthermore, persons may still be selected even though not all information 
needed to determine whether a person qualifies in all respects for a clinical trial is 
present. 

Consent to participate in a clinical trial should be obtained. A list of persons 
for whom consent was obtained can be outputted and forwarded to an entity 
interested in performing the clinical trial. Typically, this is a drug company. 
Physicians may be notified of their Institutional Review Board (IRB) statuses (e.g.,
'approved', 'pending', or 'not approved'). Expiration dates of their status may be
forwarded to approved physicians.

Because patient confidentiality is important, the anonymity of a person 
meeting the specified criteria must be preserved. The process of obtaining consent 
may include selecting physicians associated with the persons meeting the specified 
criteria, requesting approval to participate from each of the selected physicians, and 
providing consent information to persons meeting the specified criteria whose
physician provided approval to participate in the clinical trial. 

To further facilitate the process, questionnaires may be provided. These 
questionnaires may be used to ascertain qualifications for the clinical trial. 

Additionally, compensation and fees can be determined for the parties 
involved. For example, participating physicians may be compensated. The entity 
requesting the list may be charged a fee. The patients participating in the clinical trial 
may also be compensated. 

The data source used to determine the persons eligible for the clinical trial may 
include a data warehouse. Further, it may be populated with structured information 
obtained from mining unstructured patient records. The patient records may include 
patient information obtained from a plurality of participating health care providers, 
such as hospitals. 

In various alternative embodiments of the present invention, a system for 
selecting prospective clinical trials for an individual patient is provided. The system 
includes a clinical trials database, a data source containing patient information, and a 
clinical trials brokerage for generating a list of clinical trials for patients meeting 
specified criteria associated with the clinical trials. At least some of the information 
in the data source containing patient information may be obtained from mining 
unstructured patient records. 

These and other aspects, features and advantages of the present invention will 
become apparent from the following detailed description of preferred embodiments, 
which is to be read in connection with the accompanying drawings. 

Brief Description of the Drawings

FIG. 1 is a block diagram of a computer processing system to which the
present invention may be applied according to an embodiment of the present 
invention; 

FIG. 2 shows an exemplary clinical trials brokerage system according to an 
embodiment of the present invention; 

FIG. 3 shows an exemplary clinical trials brokerage system according to 
another embodiment of the present invention; and 

FIG. 4 shows a flow diagram outlining an exemplary technique for selecting a 
person for a clinical trial according to an embodiment of the present invention. 

Description of Preferred Embodiments

To facilitate a clear understanding of the present invention, illustrative 
examples are provided herein which describe certain aspects of the invention. 
However, it is to be appreciated that these illustrations are not meant to limit the 
scope of the invention, and are provided herein to illustrate certain concepts 
associated with the invention. 

It is also to be understood that the present invention may be implemented in 
various forms of hardware, software, firmware, special purpose processors, or a 
combination thereof. Preferably, the present invention is implemented in software as 
a program tangibly embodied on a program storage device. The program may be 
uploaded to, and executed by, a machine comprising any suitable architecture. 
Preferably, the machine is implemented on a computer platform having hardware 
such as one or more central processing units (CPU), a random access memory (RAM), 
and input/output (I/O) interface(s). The computer platform also includes an operating 
system and microinstruction code. The various processes and functions described 
herein may either be part of the microinstruction code or part of the program (or
combination thereof) which is executed via the operating system. In addition, 
various other peripheral devices may be connected to the computer platform such as 
an additional data storage device and a printing device. 

It is to be understood that, because some of the constituent system 
components and method steps depicted in the accompanying figures are preferably 
implemented in software, the actual connections between the system components 
(or the process steps) may differ depending upon the manner in which the present 
invention is programmed. 

FIG. 1 is a block diagram of a computer processing system 100 to which the 
present invention may be applied according to an embodiment of the present 
invention. The system 100 includes at least one processor (hereinafter processor) 
102 operatively coupled to other components via a system bus 104. A read-only 
memory (ROM) 106, a random access memory (RAM) 108, an I/O interface 110, a 
network interface 112, and external storage 114 are operatively coupled to the
system bus 104. Various peripheral devices such as, for example, a display device, a
disk storage device (e.g., a magnetic or optical disk storage device), a keyboard, and a
mouse, may be operatively coupled to the system bus 104 by the I/O interface 110 or
the network interface 112.

The computer system 100 may be a standalone system or be linked to a
network via the network interface 112. The network interface 112 may be a
hard-wired interface. However, in various exemplary embodiments, the network interface
112 can include any device suitable to transmit information to and from another
device, such as a universal asynchronous receiver/transmitter (UART), a parallel digital
interface, a software interface or any combination of known or later developed
software and hardware. The network interface may be linked to various types of 
networks, including a local area network (LAN), a wide area network (WAN), an 
intranet, a virtual private network (VPN), and the Internet. 

The external storage 114 may be implemented using a database management 
system (DBMS) managed by the processor 102 and residing on a memory such as a 
hard disk. However, it should be appreciated that the external storage 114 may be 
implemented on one or more additional computer systems. For example, the 
external storage 114 may include a data warehouse system residing on a separate 
computer system. 

Those skilled in the art will appreciate that other alternative computing 
environments may be used without departing from the spirit and scope of the 
present invention. 

Referring to FIG. 2, a clinical trials brokerage 250 is illustrated. The clinical 
trials brokerage 250 is shown operatively connected to a data repository which 
contains patient information typically collected from one or more health care
organizations, such as hospitals. This data repository is called a structured clinical
patient record (CPR) 280. In various embodiments of the present invention, a 
plurality of drug companies, such as drug company 210, request lists of persons 
meeting specified criteria for clinical trials. The structured CPR 280 is then consulted 
to obtain the lists of persons meeting the specified criteria. 

The specified criteria may include probability information, thus allowing the 
selection of patients likely to meet the specified criteria for the clinical study (e.g., 
90% likelihood of diabetes, 70% likelihood of hypertension). In this case, the relevant
patient records would include probabilistic information. 
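
The threshold-based selection described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each structured record stores an estimated probability per condition; the names PatientRecord, condition_probability, and select_likely_matches are illustrative only and do not appear in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical structured record: condition name -> estimated probability.
    patient_id: str
    condition_probability: dict = field(default_factory=dict)


def select_likely_matches(records, criteria):
    """Return IDs of patients whose estimated probability meets every threshold.

    criteria maps a condition name to the minimum required probability.
    A condition missing from a record is treated here as probability 0.0,
    which is an assumption of this sketch.
    """
    selected = []
    for record in records:
        if all(record.condition_probability.get(cond, 0.0) >= minimum
               for cond, minimum in criteria.items()):
            selected.append(record.patient_id)
    return selected


records = [
    PatientRecord("p1", {"diabetes": 0.95, "hypertension": 0.80}),
    PatientRecord("p2", {"diabetes": 0.92, "hypertension": 0.40}),
    PatientRecord("p3", {"diabetes": 0.50, "hypertension": 0.90}),
]
# 90% likelihood of diabetes, 70% likelihood of hypertension.
criteria = {"diabetes": 0.90, "hypertension": 0.70}
print(select_likely_matches(records, criteria))  # ['p1']
```

Only the first hypothetical patient clears both thresholds; the other two each fall short on one condition.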

Furthermore, persons may still be selected even though not all information 
needed to determine whether a patient qualifies in all respects for a clinical trial is 
present. In this case, the list would include "persons of interest," some of whom
might later be excluded from participating in the clinical trial for various reasons. 
Information about each person meeting the selection may additionally be provided, 
including information about other clinical trials that the person participated in and 
whether a placebo was administered. 
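
One way to keep "persons of interest" on a list when information is missing is to distinguish a criterion that is known to fail from one that cannot yet be evaluated. The following is a sketch under that assumption; the Match enumeration, the evaluate function, and the record layout are hypothetical.

```python
from enum import Enum


class Match(Enum):
    MEETS = "meets"
    FAILS = "fails"
    UNKNOWN = "unknown"


def evaluate(record, criteria):
    """Classify a record against criteria given as {field: predicate}.

    A field absent from the record yields UNKNOWN rather than FAILS, so the
    person can stay on the list as a "person of interest".
    """
    unknown = False
    for field_name, predicate in criteria.items():
        if field_name not in record or record[field_name] is None:
            unknown = True
        elif not predicate(record[field_name]):
            return Match.FAILS
    return Match.UNKNOWN if unknown else Match.MEETS


# Hypothetical criteria: adult and diabetic.
trial_criteria = {
    "age": lambda age: age >= 18,
    "diabetic": lambda flag: flag is True,
}
print(evaluate({"age": 40, "diabetic": True}, trial_criteria))  # Match.MEETS
print(evaluate({"age": 40}, trial_criteria))                    # Match.UNKNOWN
print(evaluate({"age": 12}, trial_criteria))                    # Match.FAILS
```

Persons classified as UNKNOWN could then be sent a questionnaire or reviewed further before being included or excluded.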

The system may keep track of a plurality of clinical trials, and maintain a list of
persons who were administered a placebo instead of the drug being tested. In many
cases, a person is disqualified from a trial if he or she participated in a trial for a 
similar drug; however, if it is determined that a placebo was administered, the system 
may be configured to not exclude the person. In other cases, the system would 
provide information about the trial(s) that the person participated in. 
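
The placebo rule can be sketched as a configurable check over a person's prior trials. This is an illustration only; the function name, the flag ignore_placebo_arms, and the dictionary keys "drug" and "received_placebo" are assumptions, not part of the disclosure.

```python
def prior_trial_disqualifies(prior_trials, similar_drugs, ignore_placebo_arms=True):
    """Return True if a prior trial of a similar drug should exclude the person.

    prior_trials is a list of dicts with hypothetical keys "drug" and
    "received_placebo". When ignore_placebo_arms is True, a prior trial in
    which the person received only a placebo does not count against them.
    """
    for trial in prior_trials:
        if trial["drug"] not in similar_drugs:
            continue
        if ignore_placebo_arms and trial["received_placebo"]:
            continue
        return True
    return False


history = [{"drug": "drug_a", "received_placebo": True}]
print(prior_trial_disqualifies(history, {"drug_a"}))                             # False
print(prior_trial_disqualifies(history, {"drug_a"}, ignore_placebo_arms=False))  # True
```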

A physician, such as physician 230, may be contacted if one of their patients 
meets the specified criteria for a clinical trial. Prior to releasing information to a drug 
company, it is generally necessary to obtain agreement of the patient's physician and 
an informed consent of the patient to participate in the trial. For example, the 
physician 230 may recommend to a patient that a clinical trial being conducted by 
the drug company 210 would be beneficial. The details of the trial may have been
forwarded to the physician 230. Furthermore, physicians may be notified of their
Institutional Review Board (IRB) statuses (e.g., 'approved', 'pending', or 'not approved').
Expiration dates of their status may be forwarded to approved physicians.

The clinical trials brokerage 250 can be notified that the patient provided an
intent to participate. When the necessary informed consent information is obtained, 
the clinical trials brokerage 250 can provide the identity of the patient (and other 
patient information) to the drug company 210. 
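
The release of identity only after consent can be sketched as follows. The sketch assumes a hypothetical Candidate record with two flags, one for physician agreement and one for the patient's informed consent; none of these names come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    patient_id: str          # internal identifier, not sent before consent
    name: str
    physician_approved: bool = False
    patient_consented: bool = False


def view_for_requester(candidate, index):
    """Return what the requesting entity may see for one candidate."""
    if candidate.physician_approved and candidate.patient_consented:
        return {"name": candidate.name, "patient_id": candidate.patient_id}
    # Before consent, expose only an opaque label that reveals nothing.
    return {"label": f"candidate-{index}"}


pending = Candidate("p7", "Jane Doe", physician_approved=True)
cleared = Candidate("p8", "John Roe", physician_approved=True, patient_consented=True)
print(view_for_requester(pending, 1))  # {'label': 'candidate-1'}
print(view_for_requester(cleared, 2))  # {'name': 'John Roe', 'patient_id': 'p8'}
```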

Preferably, the structured CPR 280 is populated with patient information using 
data mining techniques described in "Patient Data Mining," by Rao et al., Attorney
Docket No. 2001P20906US01, copending U.S. Patent Application Serial No. 
10/ , , filed herewith, which is incorporated by reference herein in its entirety. 

That disclosure teaches a data mining framework for mining high-quality 
structured clinical information. The data mining framework includes a data miner 
that mines medical information from a computerized patient record based on 
domain-specific knowledge contained in a knowledge base. The data miner includes 
components for extracting information from the computerized patient record, 
combining all available evidence in a principled fashion over time, and drawing
inferences from this combination process. The mined medical information is stored in 
a structured computerized patient record. 

To determine whether a patient meets the specified criteria for the clinical study,
multiple data sources typically need to be consulted. For example, to determine whether the patient is
diabetic, the system might have to examine the following information: 

(a) ICD-9 billing codes for secondary diagnoses associated with diabetes; 

(b) drugs administered to the patient that are associated with the treatment of 
diabetes (e.g., insulin); 

(c) patient's lab values that are diagnostic of diabetes (e.g., two successive
blood sugar readings over 250 mg/dL);

(d) doctor mentions that the patient is a diabetic in the H&P (history & physical)
or discharge note (free text); and

(e) patient procedures (e.g., foot exam) associated with being a diabetic.

As can be seen, there are multiple independent sources of information,
observations from which can support (with varying degrees of certainty) that the
patient is diabetic (or more generally has some disease/condition). Not all of them
may be present, and in fact, in some cases, they may contradict each other.
Probabilistic observations can be derived, with varying degrees of confidence. Then
these observations (e.g., about the billing codes, the drugs, the lab tests, etc.) may
be probabilistically combined to come up with a final probability of diabetes.

Note that there may be information in the patient record that contradicts diabetes.
For instance, the patient has some stressful episode (e.g., an operation) and his or
her blood sugar does not go up.
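
The disclosure does not fix a particular method for probabilistically combining observations. As one possible sketch, and only under the assumption that the observations are conditionally independent, supporting and contradicting evidence can be combined by adding log likelihood ratios to the log odds of a prior. The function name and all numeric values below are hypothetical.

```python
import math


def combine_evidence(prior, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios.

    Each ratio is P(observation | condition) / P(observation | no condition).
    Ratios above 1 support the condition; ratios below 1 contradict it.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values: a billing code (4.0) and an insulin order (6.0) support
# diabetes, while a normal blood sugar after an operation (0.5) contradicts it.
probability = combine_evidence(prior=0.1, likelihood_ratios=[4.0, 6.0, 0.5])
print(round(probability, 3))  # 0.571
```

With these illustrative numbers the prior odds of 1:9 are multiplied by 12, giving odds of 4:3 and a probability of 4/7. Other combination schemes could equally be used.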

It should be appreciated that the selection of patients for clinical trials may be 
based on probabilistic information. Thus, a list of patients that meet the specified 
criteria may comprise a list of patients likely (e.g., according to a particular degree of 
confidence) to have met the criteria for the clinical trial. 

Since it may be necessary to obtain additional information or to verify 
information about a participant, the clinical trials brokerage 250 may output, or 
otherwise provide, questionnaires. These questionnaires may be used to ascertain 
qualifications for the clinical trial. For example, the patient may be asked to provide a
detailed family history of particular diseases. 

In addition to providing a list of persons meeting the specified criteria, the 
clinical trials brokerage 250 may also calculate various charges and fees. For 
example, participating physicians may need to be compensated. The drug company 
may be charged a fee for the list. Additionally, participants in the clinical trial may 
also be compensated. 

In various embodiments of the present invention, lists of persons who are
pre-qualified for certain types of clinical trials may be generated. These lists of
pre-qualified individuals may be made available to drug companies or other entities
interested in conducting a clinical trial. 

Referring to FIG. 3, an alternate embodiment of the present invention is 
illustrated. In this embodiment, a clinical trials brokerage 350 is able to access a 
structured CPR 380 containing mined structured patient information, and also a 
clinical trials database 390 containing information about various clinical trials. The 
information in the clinical trials database 390 may include information regarding the 
qualifications for clinical trials along with other information regarding the trials. A 
patient, such as patient 335, may request information about a particular clinical trial. 
The patient may either directly access the clinical trials brokerage 350 or go through a 
physician, such as physician 330. The clinical trials brokerage 350 may access the
structured CPR 380 (populated with information in the same manner as the CPR 280) 
to retrieve information about the patient, and attempt to match clinical trials of 
interest to the patient based on the medical history of the patient and available trials. 
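
Matching in this direction, from one patient to many trials, can be sketched as a filter over a trials database. The layout below (a patient dictionary and a list of trials, each with a name and a mapping of attribute to predicate) is an assumption made for illustration.

```python
def match_trials(patient, trials):
    """Return names of trials whose every criterion the patient satisfies.

    patient is a dict of attributes; each trial is a dict with a "name" and
    a "criteria" mapping of attribute -> predicate. Both layouts are
    hypothetical. An attribute missing from the patient counts as unmet.
    """
    matches = []
    for trial in trials:
        if all(attr in patient and test(patient[attr])
               for attr, test in trial["criteria"].items()):
            matches.append(trial["name"])
    return matches


trials_db = [
    {"name": "trial-A", "criteria": {"diabetic": lambda v: v, "age": lambda a: a >= 50}},
    {"name": "trial-B", "criteria": {"smoker": lambda v: v}},
    {"name": "trial-C", "criteria": {"age": lambda a: a < 65}},
]
patient_335 = {"diabetic": True, "age": 58, "smoker": False}
print(match_trials(patient_335, trials_db))  # ['trial-A', 'trial-C']
```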

Referring to FIG. 4, a flow diagram outlining an exemplary technique for
selecting a person for a clinical trial is illustrated. Beginning at step 401, a person is
selected from among a set of persons meeting specified criteria. This step may 
include receiving a request for a list of persons meeting specified criteria for a clinical 
trial, and retrieving a set of patient records from a data source to determine persons 
meeting the specified criteria. 

For example, a drug company might be interested in selecting black males 
who are diabetic and have had a heart attack within the last three years. This might 
be used to test a new drug. 

Using conventional approaches, satisfying the above-mentioned selection 
criteria could be difficult because computerized hospital databases generally do not 
store such information. However, by employing the data mining techniques 
described in "Patient Data Mining," by Rao et al., Attorney Docket No.
2001P20906US01, copending U.S. Patent Application Serial No. 10/ , , filed
herewith, a structured CPR can be populated with such patient information, thus
allowing these selection criteria to be satisfied.

In step 402, the person's physician can be notified that the person has been
selected for the clinical trial. At this point, a hospital's Institutional Review
Board (IRB) can also be notified. The physician can also be notified if IRB approval
has already been granted for this trial at this site, or if he or she needs to wait
for the IRB approval for this trial. Next, in step 403, a determination is
made as to whether the physician will participate in the study. If it is determined that 
the physician will participate, control continues to step 404; otherwise control 
terminates at step 408. 

In step 404, the person is notified that he or she may qualify for the clinical
trial. The patient can be contacted directly, or indirectly through a
physician. At this point, the patient may be given detailed information about the
physician. At this point, the patient may be given detailed information about the 
clinical trial. The patient may be asked for additional information, such as through a 
questionnaire. The questionnaire may be used to determine qualification for the 
study and/or as a way to obtain additional useful information. 

Next, in step 405, a determination is made as to whether the person indicated 
a desire to participate in the clinical trial. If the person notified his or her physician of 
an intent to participate, control continues to step 406; otherwise control terminates 
at step 408. 

In step 406, release information is obtained. At this point the person may be 
provided with a consent form or be directed to complete one provided to him or her by his
or her physician. Any information regarding participant compensation, including 
reimbursements, may also be provided. Control continues to step 407. 

In step 407, fees and charges may be determined. For instance, the entity 
requesting the list of patients may be charged an appropriate fee for the list of 
patients. Furthermore, the physician and trial participants may also be compensated 
for their participation in the study. Control continues to step 408 where the 
operation stops. 
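
The control flow of steps 401 through 408 can be summarized in a short sketch. It models only the two decision points (steps 403 and 405) as boolean inputs and records which steps are visited; the function name and parameters are hypothetical.

```python
def run_selection_flow(physician_participates, patient_wants_to_join):
    """Walk steps 401-408 and return the list of step numbers visited."""
    visited = [401, 402, 403]        # select person, notify physician, decide
    if not physician_participates:
        return visited + [408]       # physician declines: stop
    visited += [404, 405]            # notify person, check intent
    if not patient_wants_to_join:
        return visited + [408]       # person declines: stop
    visited += [406, 407, 408]       # release information, fees, stop
    return visited


print(run_selection_flow(True, True))
# [401, 402, 403, 404, 405, 406, 407, 408]
print(run_selection_flow(False, True))
# [401, 402, 403, 408]
```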

As shown in FIGs. 1-4, this invention is preferably implemented using a 
general purpose computer system. However, the systems and methods of this
invention can be implemented using any combination of one or more programmed 
general purpose computers, programmed microprocessors or micro-controllers and 
peripheral integrated circuit elements, ASIC or other integrated circuits, digital signal 
processors, hardwired electronic or logic circuits such as discrete element circuits,
programmable logic devices such as a PLD, PLA, FPGA or PAL, or the like. In general, 
any device capable of implementing a finite state machine that is in turn capable of 
implementing the flowchart shown in FIG. 4 can be used to implement this system. 

Although illustrative embodiments of the present invention have been 
described herein with reference to the accompanying drawings, it is to be understood 
that the invention is not limited to those precise embodiments, and that various 
other changes and modifications may be effected therein by one skilled in the art
without departing from the scope or spirit of the invention. 

WHAT IS CLAIMED IS:

1. A system for selecting prospective participants in a clinical trial, comprising:

a data source containing patient information, at least some of the patient
information obtained from mining unstructured patient records; and

a clinical trials brokerage for retrieving a set of patient records from the data 
source and generating a list of persons who meet specified criteria associated with 
the clinical trial. 

2. The system of claim 1, wherein the clinical trials brokerage is configured to
obtain consent from one or more of the persons meeting the specified criteria.

3. The system of claim 1, wherein the list of persons meeting the specified
criteria is requested from an entity interested in performing the clinical trial.

4. The system of claim 1, wherein the anonymity of the persons meeting the
specified criteria is preserved until consent is provided.

5. The system of claim 1, wherein the list of persons meeting specified criteria
includes persons pre-qualified for the clinical trial.

6. The system of claim 1, wherein the data source includes information collected
from a plurality of hospitals.

7. The system of claim 1, wherein the specified criteria includes probability
criteria. 

8. The system of claim 1, wherein the obtained patient records include
probabilistic information.

9. The system of claim 1, wherein information needed to determine whether a
patient qualifies in all respects is not included in the obtained patient records.

10. The system of claim 1, wherein information about each person in the list is
provided.

11. The system of claim 10, wherein the information includes information
regarding previous clinical trials that the person participated in. 

12. A method for selecting prospective participants in a clinical trial, comprising 
the steps of: 

receiving a request for a list of persons meeting specified criteria associated 
with a clinical trial; and 

retrieving a set of patient records from a data source to determine persons 
meeting the specified criteria. 

13. The method of claim 12, further comprising the steps of:

obtaining consent to participate in the clinical trial from one or more of the
persons meeting the specified criteria; and 

outputting a list of persons from whom consent was obtained. 

14. The method of claim 13, further including the step of forwarding the list of
persons to an entity interested in performing the clinical trial.

15. The method of claim 13, wherein the step of obtaining consent comprises the
steps of:

selecting physicians associated with the persons meeting the specified criteria;

requesting approval to participate from each of the selected physicians; and

providing consent information to persons meeting the specified criteria if their
physician provided approval to participate in the clinical trial.

16. The method of claim 13, wherein obtaining consent includes notifying
physicians of their Institutional Review Board (IRB) statuses.

17. The method of claim 16, wherein obtaining consent further includes
forwarding to accepted status physicians expiration dates of their IRB approvals.

18. The method of claim 14, wherein the request for the list of persons is received
from the entity interested in performing the clinical trial.

19. The method of claim 12, wherein the anonymity of the persons meeting the
specified criteria is preserved until consent is provided. 

20. The method of claim 12, further comprising the step of providing
questionnaires.

21. The method of claim 20, wherein the questionnaires are used to ascertain
qualification for the clinical trial. 

22. The method of claim 14, wherein the entity requesting the list of patients is 
charged a fee for the list of patients. 

23. The method of claim 14, wherein persons participating in the clinical trial are 
compensated. 

24. The method of claim 15, wherein a participating physician is compensated.

25. The method of claim 12, wherein the data source is a data warehouse.

26. The method of claim 25, wherein the data warehouse is populated with 
structured patient information obtained from mining unstructured patient records. 

27. The method of claim 12, wherein the request is received from a drug
company.

28. The method of claim 12, wherein the data source includes information
collected from a plurality of hospitals.

29. The method of claim 12, wherein the specified criteria includes a probability
value. 

30. The method of claim 29, wherein the probability value includes a confidence 
interval. 

31. The method of claim 12, wherein the obtained patient records include
probabilistic information.

32. The method of claim 12, wherein information needed to determine whether a
patient qualifies in all respects is not included in the obtained patient records.

33. The method of claim 12, wherein information about each person in the list is
generated. 

34. The method of claim 12, wherein the information includes information 
regarding previous clinical trials that the person participated in. 

33. A program storage device readable by a machine, tangibly embodying a
program of instructions executable on the machine to perform method steps for 
selecting prospective participants in a clinical trial, the method steps comprising: 

receiving a request for a list of persons meeting specified criteria associated 
with a clinical trial; and 

retrieving a set of patient records from a data source to determine persons 
meeting the specified criteria. 

34. A system for selecting prospective clinical trials for an individual patient, 
comprising: 

a clinical trials database;

a data source containing patient information; and 

a clinical trials brokerage for generating a list of clinical trials for patients 
meeting specified criteria associated with the clinical trials. 

35. The system of claim 34, wherein at least some of the information in the data 
source containing patient information is obtained from mining unstructured patient 
records. 